ABSTRACT

We model the extremely massive and luminous lens galaxy in the Cosmic Horseshoe 
Einstein ring system J1004+4112, recently discovered in the Sloan Digital Sky Survey. 
We use the semi-linear method of Warren & Dye (2003), which pixelises the source 
surface brightness distribution, to invert the Einstein ring for sets of parameterised 
lens models. Here, the method is refined by exploiting Bayesian inference to optimise 
adaptive pixelisation of the source plane and to choose between three differently pa- 
rameterised models: a singular isothermal ellipsoid, a power law model and a NFW 
profile. The most probable lens model is the power law with a volume mass density
$\rho \propto r^{-1.96 \pm 0.02}$ and an axis ratio of ~ 0.8. The mass within the Einstein ring (i.e.,
within a cylinder with projected distance of ~ 30 kpc from the centre of the lens
galaxy) is $(5.02 \pm 0.09) \times 10^{12}\,M_\odot$, and the mass-to-light ratio is ~ 30. Even though
the lens lies in a group of galaxies, the preferred value of the external shear is almost 
zero. This makes the Cosmic Horseshoe unique amongst large separation lenses, as 
almost all the deflection comes from a single, very massive galaxy with little boost 
from the environment.

1 INTRODUCTION

The measurement of galaxy mass distributions using strong 
gravitational lensing is now a well-established process, having
found application to several tens of systems to date (for
example, see Dye & Warren 2007, and references therein).
The main attraction of strong lensing over other methods
is its insensitivity to the dynamical state of the deflecting
mass. The main disadvantage is that some features of the lens
mass distribution, such as the ellipticity, are much more robustly
constrained by the modelling than others, such as the
radial profile (Saha & Williams 2003).

Multiple images of a background source can con- 
strain the radial profile of the lens projected mass 
density only weakly (for example, see the review by 
Schneider, Kochanek & Wambsganss 2006). However, some
of the degeneracy is lifted by the incorporation of extra constraints
from the observed velocity dispersion profile of the
lens, a technique first applied by Sand, Treu & Ellis (2002)
to the cluster MS 2137-23 and by Treu & Koopmans
(2002) to the early type galaxy MG 2016+112 and subsequently
to a number of systems since (Koopmans & Treu
2003; Sand et al. 2004).

Dye & Warren (2005) showed how Einstein ring systems,
i.e., strong lens systems where an extended source is
imaged into a complete or near-complete ring, can constrain
the mass profile of the lens more strongly than systems
with multiple point-like images. This work used the semi-linear
method of Warren & Dye (2003), so called because
the problem of finding the best fit lens model and source
surface brightness distribution is split into a linear inversion
of the source for a given non-linearly parameterised
lens model. The technique has been used by several other
studies to date (Treu & Koopmans 2004; Treu et al. 2006;
Koopmans et al. 2006). Koopmans (2005) presented an enhanced
version of the method which also reconstructs the
lens gravitational potential non-parametrically. In addition,
a Bayesian version of the semi-linear method was developed
by Suyu et al. (2006).

In this paper, we apply the semi-linear method to re- 
construct the lens mass profile and source surface bright- 
ness image of the Cosmic Horseshoe Einstein ring system 
J1004+4112, recently discovered in the Sloan Digital Sky
Survey by Belokurov et al. (2007). This is one of the largest
and most complete Einstein rings thus far discovered, with a
diameter of 10" and subtending an angle of ~ 300°. The lens is
an exceptionally massive Luminous Red Galaxy (LRG) with
a redshift of 0.44 and a velocity dispersion of ~ 430 km s$^{-1}$,
estimated from a mediocre signal-to-noise spectrum. The
source is a star-forming galaxy of BX type, using the nomenclature
of Steidel et al. (2004), with a redshift of 2.379.

Belokurov et al. (2007) already provided some simple
analysis, by picking out four density knots or maxima
in the ring and using techniques from the modelling
of quadruply-imaged point sources to reconstruct the lensing
mass (Evans & Witt 2003). This modelling threw up a
number of unresolved questions. First, there are more than
four density maxima in the ring, hence Belokurov et al.
(2007) provided a number of possibilities for the mass reconstruction.
Their models were restricted to scale-free,
isothermal-like mass profiles, though with rather general az- 
imuthal variations. The origin of the additional density max- 
ima in the ring was unclear - they were thought to arise 
from the lensing of more than one source or from higher or- 
der (sextuple) imaging. Second, although the LRG lies in a 
galaxy group, the group's contribution to the lensing deflec- 
tion via external shear was found to be modest. Apparently, 
almost all of the lensing effect is provided by the LRG it- 
self. This is surprising because almost all the known lenses 
with image separations greater than ~ 3" are produced 
by over-dense environments, with a significant lensing en- 
hancement provided by the group or cluster. Third, although 
the visible light distribution of the LRG is nearly circular, 
the mass reconstructions were more flattened and irregular.
Fourth, although Belokurov et al. (2007) provided models
that matched the image locations, they did not successfully
reproduce the image magnifications. All this motivates a re- 
turn to the Cosmic Horseshoe, but with a more sophisticated 
ring modelling technique. 

Here, we determine the most probable mass profile for 
the Cosmic Horseshoe lens from three popular models. This 
is done by using a refinement of the semi-linear method of
Warren & Dye (2003). To compare between models, we follow
the technique of maximising the Bayesian evidence as
derived by Suyu et al. (2006). The layout of this paper is
as follows. In Section 2 we briefly describe the data. Our
method of analysis is outlined in Section 3 and applied in
Section 4. We summarise the findings of this work in Section 5.
Throughout this paper, we assume the following cosmological
parameters: $H_0 = 70$ km s$^{-1}$ Mpc$^{-1}$, $\Omega_m = 0.3$,
$\Omega_\Lambda = 0.7$.



2 DATA 

The Cosmic Horseshoe was discovered by Belokurov et al.
(2007) by searching the Sloan Digital Sky Survey (SDSS)
for luminous red galaxies with multiple, faint, blue
companions. The centre of the lens galaxy lies at
(11h 48m 33.15s, 19°30'3.5"). We refer the reader to this discovery
paper for full details of the data and reduction which
we briefly outline here. 

Follow-up imaging of the lens system was carried out in 
May 2007 at the 2.5m Isaac Newton Telescope (INT) in La 
Palma. Images were acquired in the wavebands U, g and i 
with the Wide Field Camera. Each image was integrated for 
a total of 600s and reduced with the Cambridge Astronomical
Survey Unit INT pipeline (Irwin & Lewis 2001). The
data in each band are shown in the first row of Figure 1.

Parameter        U              g              i

$L_{1/2}$        1.20 ± 0.32    6.9 ± 0.3      61.2 ± 0.4
$n$              4.24 ± 0.4     4.71 ± 0.12    5.40 ± 0.04
$r_0$ (")        1.5 ± 2.0      6.1 ± 0.6      3.9 ± 0.1
$\theta_s$ (°)   85 ± 20        91 ± 4         91 ± 1
$q_s$            0.92 ± 0.12    0.83 ± 0.03    0.88 ± 0.01

Table 1. Sersic profile parameters fit to the U, g and i band
data. The normalisation $L_{1/2}$ has units of image counts matching
the observed images in Figure 1. The orientation $\theta_s$ is in degrees
counter-clockwise with respect to the positive y-axis.

Long-slit spectroscopy of the lens galaxy and arc was 
also carried out in May 2007 at the 6m BTA telescope 
of the Special Astrophysical Observatory (SAO), Nizhnij 
Arkhyz, Russia. Absorption by Ca H and K in the lens
spectrum places the lens galaxy at a redshift of z = 0.44.
Belokurov et al. (2007) estimate a velocity dispersion of the
lens of 430 ± 50 km s$^{-1}$ by Gaussian profile fitting to the absorption
lines. The slit was placed ~ 1" from the centre of
the lens which, given the seeing of 1.7" and effective radius
of ~ 2", means that the spectrum is dominated by flux from
within the half light radius. Ly$\alpha$ emission and absorption
features in the spectrum of the arc indicate that the source 
lies at a redshift of z = 2.38. 

To remove possible contamination of the ring by flux 
from the lens galaxy, we fitted an elliptical Sersic profile to 
the lens galaxy in each waveband. The fitted profiles were 
subtracted prior to our analysis. The second row in Figure
1 shows the lens-subtracted ring image for each waveband.

Table 1 lists the U, g and i best fit parameters of the
Sersic profile which has the form

$L_{1/2} \exp\{-B(n)[(r/r_0)^{1/n} - 1]\}$.   (1)

The parameters $n$ and $r_0$ were allowed to vary in the fit
as well as the axis ratio, $q_s$ (i.e., minor axis divided by major
axis), orientation, $\theta_s$, and the centroid. We use the expression
for $B(n)$ given by Ciotti & Bertin (1999). In the fitting,
we convolved each trial surface brightness profile with
a Gaussian point spread function (PSF) that matched the
image seeing determined from stars in the field. All three
fits gave acceptable $\chi^2$ values. Note that the ellipticity and
position angle of the major axis are in good agreement with
the results in Table 1 of Belokurov et al. (2007), who fitted
a PSF-convolved de Vaucouleurs profile to the light distribution.
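
As a concrete illustration of equation (1), the following Python sketch evaluates the Sersic profile. It is only a sketch under stated assumptions, not the fitting code used here: the function names are hypothetical, B(n) is taken to be the commonly quoted asymptotic series of Ciotti & Bertin (1999) truncated after the 1/n^2 term, and ellipticity and PSF convolution are left out. The example values are the g band entries of Table 1.

```python
import numpy as np

def sersic_b(n):
    """Asymptotic series for B(n), truncated after the 1/n^2 term (assumed form)."""
    return 2.0 * n - 1.0 / 3.0 + 4.0 / (405.0 * n) + 46.0 / (25515.0 * n**2)

def sersic(r, l_half, r0, n):
    """Equation (1): L_half * exp{-B(n) [(r/r0)^(1/n) - 1]} for circular radius r."""
    r = np.asarray(r, dtype=float)
    return l_half * np.exp(-sersic_b(n) * ((r / r0) ** (1.0 / n) - 1.0))

# g band values from Table 1: L_half = 6.9 counts, n = 4.71, r0 = 6.1 arcsec
radii = np.array([1.0, 6.1, 12.0])
profile = sersic(radii, l_half=6.9, r0=6.1, n=4.71)
```

By construction the profile equals the normalisation $L_{1/2}$ at $r = r_0$, which gives a quick sanity check on any implementation.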



3 METHODOLOGY 

3.1 Bayesian Semi-linear Inversion 

The original semi-linear method was derived by
Warren & Dye (2003), first applied by Dye & Warren
(2005) and placed within a Bayesian framework by
Suyu et al. (2006). We give an outline of the method in
this section but refer the reader to these publications for 
more comprehensive details. 

The technique assumes a pixelised image and source
plane. The term 'semi-linear' refers to the fact that the inversion
problem can be divided into a set of linear parameters
- the surface brightnesses of the source plane pixels
- and a set of non-linear parameters that define the lens 
model. Generally, the source surface brightness distribution 
must be regularised to ensure that the linear inversion step 
is not mathematically ill-posed (see below). This gives rise 
to an extra non-linear parameter called the regularisation
weight.

Warren & Dye (2003) noted that heavy regularisation
biases the reconstructed source, in turn biasing the best fit
lens model. Therefore, instead of applying regularisation,
Dye & Warren (2005) ensured a well posed linear inversion
through use of an adaptively gridded source plane. In this 
way, regions of the source plane that are not well constrained 
by the observed ring, i.e., areas of low magnification, are 
gridded with large pixels whilst strongly constrained ar- 
eas of the source plane are more finely gridded. The degree 
to which source pixel sizes depend on the magnification is 
controlled through another non-linear parameter called the 
splitting factor (see below). In addition to ensuring a well 
posed problem, an adaptive grid has the appealing charac- 
teristic that the reconstructed source has a more uniform 
error map. 

A more serious problem with regularisation is that it 
smoothes the reconstructed source, effectively increasing the 
number of degrees of freedom by an amount that cannot 
be satisfactorily quantified. This is especially problematic 
when comparing different lens models, as a fixed regularisa- 
tion weight for one model generally does not give the same 
increase in number of degrees of freedom for another. Therefore,
when comparing different regularised models, $\chi^2$ is not
a useful statistic.

In the Bayesian version of the semi-linear method derived
by Suyu et al. (2006), the regularisation weight is set
automatically by the data. Crucially, the problem of compar- 
ing different lens models is solved by the Bayesian evidence 
which allows models to be objectively ranked as we describe 
below. 

In the present work, we combine the advantages of both 
the Bayesian approach and an adaptive source grid. As well 
as allowing model ranking and regularisation, the Bayesian 
evidence lets the data select the optimal source pixelisation 
by finding the most probable splitting factor. 

In the analysis outlined in the next section, it is helpful 
to keep the regularisation weight and splitting factor seg- 
regated from the linear source surface brightnesses and the 
non-linear lens model parameters. Following the terminology
of Barnabè & Koopmans (2007), we will refer to these extra
two non-linear parameters as 'hyperparameters' by virtue of 
their indirect influence on the lens and source. 



3.2 Implementation of the Inversion Method 

The process of establishing the most probable lens parame- 
terisation is split into three levels of inference. In the inner- 
most level, the best fit source surface brightness distribution 
for a given set of lens model parameters and hyperparame- 
ters is determined with a linear inversion step. This proceeds 
as follows: A PSF-smeared image is computed for every 
source pixel. All images are created using unit surface bright- 
ness source pixels. The linear problem of finding the factor 



required to scale each image such that their co-addition best
fits the observed image gives the best fit source pixel surface
brightnesses, which as a vector is (Warren & Dye 2003)

$\mathbf{s} = (\mathbf{F} + \lambda \mathbf{H})^{-1} \mathbf{c}$.   (2)

The square matrix $\mathbf{F}$ and the vector $\mathbf{c}$ have the elements

$F_{ik} = \sum_j f_{ij} f_{kj} / \sigma_j^2$,   $c_i = \sum_j f_{ij} d_j / \sigma_j^2$   (3)

and $\mathbf{s}$ is a vector containing the best fit source pixel surface
brightnesses. Here, $d_j$ is the observed flux in image pixel $j$,
$\sigma_j$ its error and $f_{ij}$ is the flux in pixel $j$ of the image of source
pixel $i$ for the current lens model. The solution is regularised
by the square regularisation matrix $\mathbf{H}$, scaled by the regularisation
weight $\lambda$ (see Press et al. 2001, and Warren & Dye
2003). The standard errors of the reconstructed source pixels
are given by the diagonal terms of the covariance matrix
$\mathbf{C}$ which is just

$\mathbf{C} = (\mathbf{F} + \lambda \mathbf{H})^{-1}$.   (4)



In Bayesian terminology, computing the solution for s 
using equation (2) amounts to finding the most likely source
surface brightness distribution by maximising the posterior 
probability for a given lens model and a given source pixeli- 
sation and regularisation. 
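
The linear step of equations (2) to (4) can be sketched in a few lines of Python. This is an illustrative sketch rather than the code used for this work: the function name `invert_source`, the random toy mapping `f` and the choice of an identity regularisation matrix are all assumptions made for the example, and a real application would build `f` by ray-tracing and PSF-smearing each unit-brightness source pixel.

```python
import numpy as np

def invert_source(f, d, sigma, H, lam):
    """Solve equation (2) for the source pixel surface brightnesses.

    f     : (n_src, n_img) array; f[i, j] is the flux in image pixel j
            produced by unit-brightness source pixel i
    d     : (n_img,) observed image fluxes
    sigma : (n_img,) errors on d
    H     : (n_src, n_src) regularisation matrix
    lam   : regularisation weight
    """
    w = 1.0 / sigma**2
    F = (f * w) @ f.T                 # F_ik = sum_j f_ij f_kj / sigma_j^2
    c = (f * w) @ d                   # c_i  = sum_j f_ij d_j / sigma_j^2
    C = np.linalg.inv(F + lam * H)    # covariance matrix, equation (4)
    s = C @ c                         # best fit source, equation (2)
    return s, C

# Toy problem (hypothetical): 3 source pixels mapped onto 8 image pixels
rng = np.random.default_rng(0)
f = rng.uniform(0.0, 1.0, size=(3, 8))
s_true = np.array([1.0, 2.0, 3.0])
sigma = np.full(8, 0.01)
d = s_true @ f                        # noise-free image, for checking only
s_fit, C = invert_source(f, d, sigma, H=np.eye(3), lam=1e-6)
errors = np.sqrt(np.diag(C))          # standard errors of the source pixels
```

With negligible regularisation and noise-free data the input source is recovered, which is a useful check before moving to real images.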

In the second level of inference, the most probable set
of hyperparameters for a given lens model is determined by
maximising the Bayesian evidence. The evidence is a probability
distribution in the lens parameters and hyperparameters
that normalises the Bayesian expression for the posterior
probability. It allows different models to be ranked
to find the most probable model (see below). Suyu et al.
(2006) derived the evidence, $\epsilon$, for this problem, which in
our case can be expressed as

$-2 \ln \epsilon = \sum_j \left[ \left( \sum_i s_i f_{ij} - d_j \right) / \sigma_j \right]^2 + \ln[\det(\mathbf{F} + \lambda \mathbf{H})] - \ln[\det(\lambda \mathbf{H})] + \lambda \mathbf{s}^T \mathbf{H} \mathbf{s} + \sum_j \ln(2\pi\sigma_j^2)$   (5)

where the summations in $j$ act over all image pixels and the
summation in $i$ acts over all source pixels. Here, we have assumed
zero covariance between all image pixel pairs. In this
expression, the first term corresponds to $\chi^2$ and the fourth
term regularises the solution (the term denoted $\lambda G_L$ in the
work of Warren & Dye 2003). In this second level, equation
(2) must be evaluated for every trial set of hyperparameters
to allow calculation of the evidence via equation (5).
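
To make the second level of inference concrete, the Python sketch below evaluates $-2 \ln \epsilon$ term by term and scans it over trial regularisation weights. It is a minimal sketch under assumptions: the function name, the random toy mapping `f`, the simulated data and the identity regularisation matrix are hypothetical, and the splitting factor is not varied here.

```python
import numpy as np

def minus_two_ln_evidence(f, d, sigma, H, lam):
    """Evaluate -2 ln(evidence), equation (5), for a fixed lens model."""
    w = 1.0 / sigma**2
    F = (f * w) @ f.T
    c = (f * w) @ d
    A = F + lam * H
    s = np.linalg.solve(A, c)                      # equation (2)
    chi2 = np.sum(((s @ f - d) / sigma) ** 2)      # first term
    _, logdet_a = np.linalg.slogdet(A)             # ln det(F + lam H)
    _, logdet_r = np.linalg.slogdet(lam * H)       # ln det(lam H)
    reg = lam * (s @ H @ s)                        # fourth term
    norm = np.sum(np.log(2.0 * np.pi * sigma**2))  # fifth term
    return chi2 + logdet_a - logdet_r + reg + norm

# Toy data (hypothetical): 4 source pixels, 30 image pixels, Gaussian noise
rng = np.random.default_rng(1)
f = rng.uniform(0.0, 1.0, size=(4, 30))
sigma = np.full(30, 0.1)
d = np.array([1.0, 1.2, 0.9, 1.1]) @ f + rng.normal(0.0, 0.1, size=30)
H = np.eye(4)

weights = np.logspace(-3, 3, 25)
scores = np.array([minus_two_ln_evidence(f, d, sigma, H, lam) for lam in weights])
best_lam = weights[np.argmin(scores)]   # most probable weight on this grid
```

Using `slogdet` rather than `det` avoids overflow when the matrices are large, which matters once the source plane contains many pixels.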

Finally, in the third and outermost level of inference, the 
most probable lens parameters are determined by maximis- 
ing the evidence obtained from the second level. Formally, to 
rank models, the evidence should first be marginalised over
the hyperparameters. However, Suyu et al. (2006) noted
that a reasonable simplification is to approximate the distri- 
bution function of the hyperparameters as a delta function 
so that the maximum of the evidence obtained in the sec- 
ond level can be directly compared between models. We have 
adopted this approximation in the present study. 

In practical terms, the three-level procedure can be simplified.
As Barnabè & Koopmans (2007) point out, the hyperparameters
that maximise the evidence in the second
level of inference vary only slightly with different trial lens
model parameters in the third level. This means that it is
not necessary to maximise the hyperparameters with every 
trial lens parameter set. Instead, we alternate between vary- 
ing the lens parameters whilst keeping the hyperparameters 
fixed and varying the hyperparameters whilst keeping the 
lens parameters fixed. We start this process by holding the 
hyperparameters (i.e., the regularisation weight and split- 
ting factor) at a large value and varying the lens model. 
This reduces local maxima resulting in a smoother evidence 
surface so that an initial set of lens parameters lying close 
to the global maximum can be efficiently found (see also
Warren & Dye 2003).

We note two further practicalities. First, when computing
$\chi^2$, i.e., the first term in equation (5), we carry out the
sum over pixels contained within an annular mask that sur- 
rounds the ring. The mask is designed to include the image 
of the entire source plane, with minimal extraneous sky. This 
means that only significant image pixels are used, making 
the evidence more sensitive to the model parameters. Second,
we use a simulated annealing downhill simplex minimisation
algorithm to minimise $-\ln \epsilon$ given by equation (5).
We find that a slow exponentially cooled temperature with 
a half-life of ~ 30 iterations works extremely well in finding 
the desired minimum. 
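
The cooling schedule described above can be written down directly. The Python sketch below is illustrative only: the function names and the toy objective are hypothetical, and, for brevity, simple random-walk proposals with Metropolis acceptance stand in for the downhill simplex moves of the actual algorithm. Only the exponential cooling with a half-life of 30 iterations is taken from the text.

```python
import numpy as np

def temperature(iteration, t0=1.0, half_life=30.0):
    """Exponential cooling: the temperature halves every `half_life` iterations."""
    return t0 * 0.5 ** (iteration / half_life)

def anneal(func, x0, n_iter=600, step=0.5, seed=0):
    """Toy annealer; random-walk proposals replace the simplex moves."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    fx = func(x)
    best_x, best_f = x.copy(), fx
    for k in range(n_iter):
        t = temperature(k)
        trial = x + rng.normal(0.0, step, size=x.shape)
        f_trial = func(trial)
        # Always accept improvements; accept uphill moves with Boltzmann probability
        if f_trial < fx or rng.random() < np.exp(-(f_trial - fx) / t):
            x, fx = trial, f_trial
            if fx < best_f:
                best_x, best_f = x.copy(), fx
    return best_x, best_f

def bumpy(x):
    """Hypothetical stand-in for -ln(evidence), with several local minima."""
    return np.sum(x**2) + 0.3 * np.sum(1.0 - np.cos(5.0 * x))

x_best, f_best = anneal(bumpy, x0=[3.0, -2.0])
```

Early on, the high temperature lets the walk escape local minima; as the temperature decays the search becomes an ordinary downhill descent.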



3.3 Adaptive Source Plane Grid 

We adaptively grid the source plane according to the prescription
given in Dye & Warren (2005) and Dye et al.
(2007). In this scheme, smaller pixels are concentrated in 
higher magnification regions where there are stronger con- 
straints per unit area of the source plane. 

The adaptive gridding algorithm starts with a regular 
mesh of large pixels. The average magnification $\mu_i$ of every
source pixel $i$ is then computed. Those pixels that meet the
criterion $\mu_i r_i \geq s$ are then split into quarters, where $r_i$ is
the ratio of the area of pixel $i$ to the area of an image pixel
and $s$ is the 'splitting factor'. Having finished the initial loop
through all pixels, the process is repeated for the sub-pixels,
then for the sub-sub-pixels and so on until no remaining pixel
meets the splitting criterion.
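
The splitting loop can be sketched as a short recursive routine. This Python example is an illustration under assumptions rather than the implementation used here: `toy_magnification` is a hypothetical stand-in for the magnification computed from the lens model, the magnification is sampled at the pixel centre instead of being averaged over the pixel, and a `max_depth` guard is added to keep the recursion bounded.

```python
def adaptive_grid(x0, y0, size, magnification, image_pixel_area, s, max_depth=6):
    """Recursively split square source pixels while mu_i * r_i >= s.

    Returns a list of (x_centre, y_centre, size) tuples.
    """
    mu = magnification(x0, y0)
    r = size**2 / image_pixel_area        # source pixel area / image pixel area
    if mu * r < s or max_depth == 0:
        return [(x0, y0, size)]           # pixel is kept as it is
    pixels = []
    half, quarter = size / 2.0, size / 4.0
    for dx in (-quarter, quarter):
        for dy in (-quarter, quarter):
            pixels += adaptive_grid(x0 + dx, y0 + dy, half, magnification,
                                    image_pixel_area, s, max_depth - 1)
    return pixels

def toy_magnification(x, y):
    """Hypothetical magnification map, peaked at the origin."""
    return 1.0 + 20.0 / (1.0 + 25.0 * (x * x + y * y))

# Regular starting mesh of 4 x 4 large pixels covering [-2, 2] x [-2, 2]
centres = [-1.5, -0.5, 0.5, 1.5]
grid = []
for cx in centres:
    for cy in centres:
        grid += adaptive_grid(cx, cy, 1.0, toy_magnification,
                              image_pixel_area=0.04, s=40.0)
```

In this toy case only the four central pixels, where the magnification is highest, are subdivided; raising `s` leaves more large pixels, as described in the text.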

The procedure is carried out every time the splitting 
factor is varied in the evidence maximisation. Although the 
adaptive grid is dependent on the lens model, we find that 
it does not vary significantly when varying the lens parame- 
ters. Therefore, as we discussed in the previous section, sim- 
ilar to the regularisation weight, we hold the splitting fac- 
tor fixed whilst varying the lens parameters and vice versa, 
alternating until convergence is achieved. Convergence typically
occurs after only a few alternate loops.

Suyu et al. (2006) advocate second order regularisation,
whereby source surface brightness distributions that
exhibit strongly varying gradients are penalised more heav- 
ily than those with more gradual gradient changes. Al- 
though this is simple to implement with a regular source 
pixel grid, it is ill-defined on an adaptive grid (one can't 
define a set of co-linear pixel triplets). Instead, we apply 
a form of first order regularisation where strongly varying 
pixel-to-pixel surface brightnesses are penalised. We con- 
struct a matrix that takes the difference between a given 
pixel $k$ and the sum of all neighbouring pixels $l$ weighted by
$w_{kl} = (a_l/a_k)\, N \exp(-y_{kl}^2 / 2\sigma^2)$. Here, $a$ is the source pixel
area, $y_{kl}$ is the separation of the centres of pixels $k$ and $l$ and
$N$ is a normalisation constant set such that $\sum_l w_{kl}$ [...]
and $N = 1$ when $k = l$. We set $\sigma = 0.2"$. The matrix $\mathbf{w}$
here relates to the regularisation matrix $\mathbf{H}$ via $\mathbf{H} = \mathbf{w}^T \mathbf{w}$.
By construction, $\mathbf{H}$ is non-singular so that the third term in
equation (5) is always calculable.

In Section 4 we show how each lens model prefers a
different value of the splitting factor, and how this couples 
to the regularisation weight. 

3.4 Lens models 

We consider three popular mass profiles to model the distri- 
bution of the total (baryonic and dark) projected lens mass: 

• Singular isothermal ellipsoid (SIE) - This model
has been widely used in gravitational lensing (see e.g.,
Kassiola & Kovner 1993; Schneider, Ehlers & Falco 1992),
motivated by a wealth of stellar dynamical evidence favouring
the idea that galaxies are nearly isothermal. The projected
surface mass density follows $\kappa = \kappa_0 (\tilde{r}/1\,{\rm kpc})^{-1}$, where
$\tilde{r}$ is the elliptical radius defined by $\tilde{r}^2 = x'^2 + q^2 y'^2$. The co-ordinates
$x'$ and $y'$ are defined on axes aligned with the
semi-major and semi-minor axes of the ellipse and $q$ is the
ratio of the minor to the major axis. There are a total of
five parameters for the SIE model: the normalisation $\kappa_0$,
the orientation $\theta$, the axis ratio $q$, and the lens centroid in
the image plane, $(x_c, y_c)$.

• Navarro, Frenk & White (NFW) profile - This profile
was introduced by Navarro, Frenk & White (1996) as
a fit to dark matter halos created in cosmological N-body
simulations. The lensing properties have been discussed
by a number of authors (see e.g., Bartelmann 1996;
Evans & Wilkinson 1998; Keeton 2002). It has a projected
surface mass density given by

$\kappa = \kappa_0 \dfrac{1 - F(x)}{x^2 - 1}$   (6)

where $x = \tilde{r}/r_s$ and

$F(x) = \dfrac{1}{\sqrt{x^2 - 1}} \tan^{-1} \sqrt{x^2 - 1}$   $(x > 1)$
$F(x) = \dfrac{1}{\sqrt{1 - x^2}} \tanh^{-1} \sqrt{1 - x^2}$   $(x < 1)$
$F(x) = 1$   $(x = 1)$   (7)

The model is described by six parameters, but we vary five
in the evidence maximisation, keeping the scale radius $r_s$
fixed at the value of 110 kpc (= 20" at z = 0.44). This is in
accordance with the prediction by Bullock et al. (2001) for
a galaxy of similar mass and redshift to the Cosmic Horseshoe
lens. As has been shown elsewhere (Dye et al. 2007), the
lensing properties of the NFW profile depend only weakly on
the value of $r_s$ assumed, with a 10% change in $r_s$ giving rise
to only a ~ 1% change in the best fit model parameters. The
five parameters varied in the maximisation are therefore:
lens normalisation $\kappa_0$, orientation $\theta$, axis ratio $q$, and lens
centroid in the image plane $(x_c, y_c)$.

• Power law (PL) - This family of models was introduced
by Kassiola & Kovner (1993). The projected surface
mass density is stratified on concentric ellipses following the
power-law form $\kappa = \kappa_0 (\tilde{r}/1\,{\rm kpc})^{1-\alpha}$. The SIE is the special
case $\alpha = 2$. The model has six parameters: lens normalisation
$\kappa_0$, orientation $\theta$, axis ratio $q$, power-law slope $\alpha$ and
lens centroid in the image plane $(x_c, y_c)$.
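
The three radial profiles listed above can be evaluated with the short Python sketch below. It is a sketch under assumptions, not the modelling code: the function names are hypothetical, the elliptical radius $\tilde{r}$ (in kpc) is taken as an input rather than computed from image coordinates, and the value at $x = 1$ in the NFW case is set to the analytic limit $\kappa_0/3$ of equation (6).

```python
import numpy as np

def kappa_power_law(r_ell, kappa0, alpha):
    """Power-law surface mass density; alpha = 2 reproduces the SIE."""
    return kappa0 * np.asarray(r_ell, dtype=float) ** (1.0 - alpha)

def nfw_f(x):
    """Piecewise function F(x) of equation (7)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    out = np.ones_like(x)                        # value at x = 1
    hi, lo = x > 1.0, x < 1.0
    root_hi = np.sqrt(x[hi] ** 2 - 1.0)
    root_lo = np.sqrt(1.0 - x[lo] ** 2)
    out[hi] = np.arctan(root_hi) / root_hi
    out[lo] = np.arctanh(root_lo) / root_lo
    return out

def kappa_nfw(r_ell, kappa0, r_s=110.0):
    """Equation (6); (1 - F)/(x^2 - 1) tends to 1/3 as x tends to 1."""
    x = np.atleast_1d(np.asarray(r_ell, dtype=float)) / r_s
    out = np.full_like(x, kappa0 / 3.0)
    ok = x != 1.0
    out[ok] = kappa0 * (1.0 - nfw_f(x[ok])) / (x[ok] ** 2 - 1.0)
    return out

radii = np.array([10.0, 30.0, 110.0, 300.0])     # elliptical radii in kpc
sie = kappa_power_law(radii, kappa0=2.50, alpha=2.0)
pl = kappa_power_law(radii, kappa0=2.30, alpha=1.96)
nfw = kappa_nfw(radii, kappa0=0.118)
```

The normalisations used in the example are in the same units as the model normalisation $\kappa_0$, so the returned values are surface mass densities rather than dimensionless convergences.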
For each model, we maximise the evidence with and
without an external shear component. The external shear
adds a further two parameters to each model, a magnitude
$\gamma$ and an orientation $\theta_\gamma$. The deflection angle required in
the ray tracing has an analytic form for the SIE model, but
must be numerically evaluated for the NFW and PL models,
using the prescription given by Keeton (2002).

We note at this point a common misconception
regarding the mass sheet degeneracy (e.g.,
Gorenstein, Falco & Shapiro 1988). The degeneracy is
such that image structures are invariant under the transformation
$\kappa \rightarrow 1 - a + a\kappa$ where $a$ is a constant. The
degeneracy is only applicable to lens models that remain 
self-similar under the transformation. None of the three 
models applied in this paper falls into this category. For 
instance, applying the transformation to the power-law 
model does not produce a new power-law. Inverting the 
argument, this means that no combination of power-law 
parameters can give a model with a homogeneous sheet 
of matter and in this sense, the mass sheet degeneracy is 
eliminated. 



4 RESULTS 

Table 2 shows the maximised parameters for the three lens
models with and without external shear using the g band
data. The most probable model is the power law with a slope
of 1.96 ± 0.02. The evidence ranks the SIE as the next most
probable model, being only 10% as probable as the power
law. Finally, the NFW is strongly rejected, being ranked
$> 10^{10}$ times less probable than the power law. This is
perhaps not surprising given that the NFW is derived from 
simulations that neglect the effect of baryons. 
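
The relative probabilities quoted above follow directly from the differences in $\ln \epsilon$. The short Python sketch below reproduces the arithmetic using the values for the models without external shear in Table 2; it assumes, as an illustration, that the models are assigned equal prior probability so that the evidence ratio is the probability ratio.

```python
import math

# ln(evidence) for the models without external shear (Table 2)
ln_evidence = {"SIE": -4237.7, "NFW": -4262.7, "PL": -4235.4}

best = max(ln_evidence, key=ln_evidence.get)
relative_probability = {
    name: math.exp(value - ln_evidence[best])
    for name, value in ln_evidence.items()
}
for name, prob in relative_probability.items():
    print(f"{name}: {prob:.3g} relative to {best}")
```

A difference of 2.3 in $\ln \epsilon$ corresponds to a factor of about ten in probability, and a difference of 27.3 to a factor exceeding $10^{10}$.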

Figure 2 shows the significance of the residuals that remain
after subtracting the lensed image of the reconstructed
source from the observed g band ring for the SIE, PL and 
NFW. The NFW clearly leaves the strongest residuals as one 
would expect from the evidence. The difference between the 
PL and SIE residuals is not obvious upon visual inspection, 
however they differ with an RMS of ~ 5%. 

For the best-fitting PL model, the mass within the 
Einstein ring (i.e., within a cylinder with projected dis- 
tance of ~ 30 kpc from the centre of the lens galaxy) is 
$(5.02 \pm 0.09) \times 10^{12}\,M_\odot$, as much as the entire Local Group.
Using the absolute magnitude in the r band of -23.45 computed
by Belokurov et al. (2007), the mass-to-light ratio is
~ 30.
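
The mass-to-light ratio can be checked with a few lines of Python. This is a rough sketch: the solar absolute magnitude in the r band, `M_SUN_R = 4.65`, is an assumed value not given in the text, and no K-correction or evolutionary correction is applied.

```python
M_SUN_R = 4.65   # assumed absolute magnitude of the Sun in the r band

def luminosity_solar(abs_mag, m_sun=M_SUN_R):
    """Luminosity in solar units from an absolute magnitude."""
    return 10.0 ** (0.4 * (m_sun - abs_mag))

mass = 5.02e12                      # solar masses, within the Einstein ring
lum = luminosity_solar(-23.45)      # r band luminosity of the lens galaxy
mass_to_light = mass / lum
```

With this assumed solar magnitude the ratio comes out close to 30, in line with the value quoted above.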

Figure 3 shows the confidence regions on the hyperparameters
(the regularisation weight and splitting factor).
Each model has its own preferred combination of splitting 
factor and regularisation weight, although they are strongly 
degenerate. Larger splitting factors prefer smaller regulari- 
sation weights because increasing the splitting factor results 
in larger source plane pixels on average which effectively 
regularises the solution more heavily. 

The source reconstructions for the PL model for each 
of the three wavebands are shown in Figure 1. For the i and
U band reconstructions, we fixed the lens model parameters 
at their most probable values established from the g band 
data, but varied the hyperparameters to maximise the evidence.
In this way, the lens model parameters are set by the
higher signal-to-noise g band data, but the i and U band
data are allowed to select the splitting factor and regularisation
weight most appropriate to their information content.

Param.             SIE               NFW               PL

$\kappa_0$         2.50 ± 0.03       0.118 ± 0.002     2.30 ± 0.03
$\theta$           46.5 ± 2.7        55.5 ± 3.1        49.2 ± 3.0
$q$                0.76 ± 0.03       0.89 ± 0.02       0.78 ± 0.03
$x_c$              -0.12" ± 0.04"    -0.10" ± 0.04"    -0.11" ± 0.04"
$y_c$              0.05" ± 0.03"     0.04" ± 0.03"     0.02" ± 0.03"
$\alpha$                                               1.96 ± 0.02
$\ln \epsilon$     -4237.7           -4262.7           -4235.4

Param.             SIE+$\gamma$      NFW+$\gamma$      PL+$\gamma$

$\kappa_0$         2.58 ± 0.03       0.116 ± 0.002     2.37 ± 0.03
$\theta$           49.8 ± 2.7        47.9 ± 3.1        50.8 ± 3.1
$q$                0.81 ± 0.02       0.86 ± 0.02       0.83 ± 0.02
$x_c$              -0.10" ± 0.04"    -0.09" ± 0.04"    -0.11" ± 0.04"
$y_c$              0.03" ± 0.03"     0.04" ± 0.03"     0.04" ± 0.03"
$\alpha$                                               1.95 ± 0.02
$\gamma$           0.017 ± 0.005     0.011 ± 0.006     0.020 ± 0.005
$\theta_\gamma$    38.2 ± 9.4        46.1 ± 12.4       37.7 ± 8.6
$\ln \epsilon$     -4239.0           -4272.0           -4240.2

Table 2. Most probable parameters obtained by maximising the
evidence, $\epsilon$, for each model. Parameters are: total mass normalisation
$\kappa_0$ (in $10^{10}\,M_\odot$ kpc$^{-2}$), orientation in degrees counter-clockwise
with respect to the positive y-axis $\theta$, the axis ratio $q$
(minor axis divided by major axis), lens centroid in the image
plane in arcseconds offset from the observed light centroid $(x_c, y_c)$
and the slope for the PL model $\alpha$. The top and bottom halves of
the table respectively correspond to the models without and with
external shear of magnitude $\gamma$ and direction $\theta_\gamma$.

The velocity dispersion, $\sigma$, implied by the SIE model is
given by

$\sigma^2 = \Sigma_{\rm CR}\, r_E\, G$   (8)

where $\Sigma_{\rm CR}$ is the critical surface mass density (see
Schneider, Ehlers & Falco 1992, for example) and $r_E$ is the
Einstein radius which relates to the SIE model parameters
via

$(r_E / 1\,{\rm kpc}) = 2 \kappa_0 \Sigma_{\rm CR}^{-1} q^{-1/2}$.   (9)

This gives $r_E$ = 28.4 kpc corresponding to a velocity dispersion
of 496 ± 5 km s$^{-1}$, which would make the lens one of the
most massive galaxies so far known! Nonetheless, this
is consistent with the result of Gaussian fitting to absorption
lines in the SAO spectrum by Belokurov et al. (2007),
which yielded an estimate of 430 ± 50 km s$^{-1}$. Although the
spectrum is modest, there is little doubt that the lens is an 
extreme object - colour and luminosity correlate with veloc- 
ity dispersion and mass, and the lens is in the brightest and 
reddest bins for LRGs. We emphasise that the modelling
both in this paper and in Belokurov et al. (2007) does not
explicitly include a velocity dispersion constraint. Hence, it 
is reassuring that both investigations have come to similar 
conclusions as regards the velocity dispersion of the lens- 
ing galaxy. Furthermore, the consistency between the two 
measurements implies that the stellar orbits in the LRG are 
nearly isotropic. 
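
The numbers quoted for the SIE model can be reproduced from equations (8) and (9) with the Python sketch below. It is a sketch under assumptions: the function names are hypothetical, the physical constants are standard values not given in the text, and the distances are computed for the flat cosmology adopted in Section 1 by direct numerical integration.

```python
import numpy as np

C_KMS = 299792.458          # speed of light in km/s
G_KPC = 4.30091e-6          # G in kpc (km/s)^2 per solar mass

def comoving_distance(z, h0=70.0, omega_m=0.3, omega_l=0.7, n=2001):
    """Line-of-sight comoving distance in Mpc for a flat universe."""
    zs = np.linspace(0.0, z, n)
    inv_e = 1.0 / np.sqrt(omega_m * (1.0 + zs) ** 3 + omega_l)
    integral = np.sum(0.5 * (inv_e[1:] + inv_e[:-1]) * np.diff(zs))  # trapezium rule
    return C_KMS / h0 * integral

def sigma_crit(z_l, z_s):
    """Critical surface mass density in solar masses per kpc^2."""
    dc_l, dc_s = comoving_distance(z_l), comoving_distance(z_s)
    d_l = dc_l / (1.0 + z_l) * 1e3               # angular diameter distances in kpc
    d_s = dc_s / (1.0 + z_s) * 1e3
    d_ls = (dc_s - dc_l) / (1.0 + z_s) * 1e3     # valid for a flat universe
    return C_KMS**2 / (4.0 * np.pi * G_KPC) * d_s / (d_l * d_ls)

kappa0 = 2.50e10            # SIE normalisation in solar masses per kpc^2 (Table 2)
q = 0.76                    # SIE axis ratio (Table 2)
s_cr = sigma_crit(0.44, 2.379)
r_e = 2.0 * kappa0 / (s_cr * np.sqrt(q))         # equation (9), in kpc
sigma_v = np.sqrt(s_cr * r_e * G_KPC)            # equation (8), in km/s
```

Combining the two equations gives $\sigma^2 = 2 G \kappa_0 q^{-1/2}$, so the implied velocity dispersion does not depend on the cosmology, whereas the Einstein radius does.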

The results listed in Table 2 show that the contribution
of external shear is very minor. Furthermore, the evidence
ranks all models incorporating shear with a lower probability
than their non-sheared equivalent models. The sheared
models are penalised by introducing an extra two parameters
that do not bring about a significant improvement in
the fit to the data.

Figure 1. Image data and source reconstructions. Reading from left to right, the columns correspond to the g, i then U band data.
Top row gives observed image. Second row shows the lens subtracted image. Third row is the image of the reconstructed source lensed
by the most probable lens model (the 'model image'). The annulus shows the masked area over which the $\chi^2$ term is evaluated when
computing the evidence. Fourth row shows the significance of the residuals left after subtracting the model image from the observed ring
image shown in the second row. Fifth and sixth rows respectively show the reconstructed source and the source divided by the standard
errors given by the diagonal terms in the covariance matrix $\mathbf{C}$ (see Section 3.2). The northern source referred to in the text is that at
(0.7", 1.4"). Reconstructions for all three bands use the most probable PL lens model established by the g band data.

[Figure 2 panels: SIE, NFW and PL significance of residuals (g band)]

Figure 2. Significance of the residuals left after subtracting the lensed image of the reconstructed source from the observed g band ring
for the SIE, PL and NFW.

Figure 3. 68%, 95% and 99.7% confidence regions on the splitting factor and regularisation weight, $\lambda$, for each of the three lens models
using the g band data. The contours are based on the Bayesian evidence and show the strong degeneracy between both parameters. Each
plot is normalised to the maximum evidence for that lens model, indicated by the heavy point. The regularisation weight is scaled such
that a value $\lambda = 1$ weights the traces of the matrices $\mathbf{F}$ and $\mathbf{H}$ in equation (2) equally.

At first, this seems surprising, as the lens is located 
within a group or loose cluster. With such an enormous 
image separation (10") required, it is natural to expect a 
significant contribution from the environment. Even so, there is 
another telling indication that the environment plays only 
a minor role in the lensing. It was already established by 
Kochanek, Keeton & McLeod (2001) that the ellipticity of 
an Einstein ring is proportional to the external shear. The 
Cosmic Horseshoe ring is very nearly a perfect circle. This 
suggests that any perturbation from the cluster is minimal, 
as the mismatch between the orientation of the cluster and 
the lensing galaxy would generate shear and hence 
ellipticity in the ring. The same point is made in Saha & Williams 
(2003): a narrow spread in the images' galactocentric distances 
indicates a small or zero external shear and moderate galaxy 
ellipticity. We conclude that almost all the deflection is 
indeed provided by one very massive galaxy, with the group 
environment playing a very minor role. 
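
The link between shear and ring shape can be sketched for the simplest case. The example below assumes a singular isothermal sphere plus external shear and uses the tangential critical curve as a stand-in for the ring; this is an illustrative simplification, not the cited analysis. For that model the critical curve is r(phi) = b (1 - gamma cos 2 phi) / (1 - gamma^2), so its axis ratio is (1 - gamma) / (1 + gamma) and its ellipticity is about 2 gamma for small shear. The ring radius of 5" (half the 10" image separation) and the shear values are assumptions.

```python
import numpy as np

def sis_shear_critical_radius(phi, b, gamma, phi_gamma=0.0):
    """Tangential critical curve radius for an SIS plus external shear.

    b is the Einstein radius, gamma the shear strength and phi_gamma the
    shear position angle (radians). The sign convention is an assumption.
    """
    return b * (1.0 - gamma * np.cos(2.0 * (phi - phi_gamma))) / (1.0 - gamma**2)

def curve_ellipticity(gamma):
    """Ellipticity 1 - q of the critical curve, with q = (1-gamma)/(1+gamma)."""
    return 1.0 - (1.0 - gamma) / (1.0 + gamma)

phi = np.linspace(0.0, 2.0 * np.pi, 721)
radius = sis_shear_critical_radius(phi, b=5.0, gamma=0.05)  # b in arcsec (assumed)
axis_ratio = radius.min() / radius.max()

for gamma in (0.01, 0.02, 0.04):
    # Ellipticity grows almost linearly with shear for small gamma.
    print(gamma, curve_ellipticity(gamma))
```

In this toy model a nearly circular ring therefore implies a shear close to zero.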

One curiosity is that, for the best-fitting SIE and PL 
models, the axis ratio of the baryonic and dark matter in 
Table 2 is smaller than the axis ratio of the light 
distribution in Table 1. There seem to be two possible 
resolutions of this difficulty. First, there is a well-known 
degeneracy between flattening and external shear. In fact, the very 
minor contribution from external shear, as in the models 
in the lower panel of Table 2, is already enough to restore 
the axis ratios to good agreement. Second, it is quite likely 
that the ellipticity of the LRG varies with radius. Although 
much deeper imaging is required to confirm this suggestion, 
there are nonetheless many local examples of giant 
ellipticals whose central regions are rather round, but whose outer 
parts are much more elongated. A good example is the 
nominally E0 galaxy M 87, for which the ellipticity rises to 0.4 
in the outer regions (Weil, Bland-Hawthorn & Malin 1997). 
If a similar situation applies to the Cosmic Horseshoe lens 
galaxy, then the photometry of the inner parts may not be 
a good guide to the true shape. This may also provide an 
explanation as to why the position angle of the major axis of 
the best fit light profile is different from the angle preferred 
by lens models. 

Finally, we note from the source reconstructions using 
the g and i band data in the bottom-most panels of Figure 1 
that there is evidence for two peaks. However, the secondary 
northern source does not appear to be visible in the 
reconstruction from the noisier U band data. This manifests itself 
in the colour composite source shown in Figure 4. The red, 
green and blue channels of this plot are respectively the i, g 
and U source surface brightness maps plotted in the fifth row 
of Figure 1. The northern source has a yellowish-green colour 
owing to the lack of U band flux. Unfortunately, it is 
impossible to say whether this is an intrinsic colour variation or 
due to the lower sensitivity of the U band data. Similarly, 
the differing source resolutions between bands prevent a 
clear interpretation of the colour of visible structures. 

Figure 4. Colour composite of the reconstructed source. The red, 
green and blue channels in this image are formed respectively from 
the i, g and U band reconstructed sources plotted in the fifth row 
of Figure 1. 

The two peaks in the reconstructed source may be 
evidence for substructure or may indicate two sources at 
different redshifts. Figure 5 shows the contributions to 
the ring of the Cosmic Horseshoe made by each source. 
Belokurov et al. (2007) already noted that there were five 
knots or maxima in the flux density along the ring, which 
they labelled A, B, C1, C2 and D. The southern source is 
mainly responsible for A, C1, C2 and D, whilst the effect of 
the northern source is to provide the additional maximum 
at B. As the maximum at B is barely discernible in the U 
band image, it is no surprise that the reconstructed source 
in U does not show any bimodality. 

Figure 5. Images of the northern (top) and southern (bottom) 
reconstructed g band source. The northern ring is formed by 
imaging all pixels northwards of the line y = 1.2" in the reconstructed 
g band source shown in Figure 1 and the southern ring from all 
pixels to the south of this line. The ring maxima A to D follow 
the labelling of Belokurov et al. (2007). 
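
The decomposition into northern and southern rings relies on lensing being linear in the source surface brightness, so the two partial rings sum to the full model ring. A minimal sketch of that bookkeeping follows. The regular source grid, the random matrix standing in for the lensing and PSF operator, and all variable names are assumptions made for illustration; the actual reconstruction uses an adaptive source pixelisation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical regular source grid; coordinates in arcsec.
n_side = 8
y_coords = np.linspace(0.0, 2.0, n_side)
x_coords = np.linspace(-1.0, 1.0, n_side)
ys, xs = np.meshgrid(y_coords, x_coords, indexing="ij")

source = rng.random((n_side, n_side))        # mock reconstructed source
lens_matrix = rng.random((50, n_side**2))    # stand-in for lensing + PSF

# Split the source at the line y = 1.2 arcsec.
north = np.where(ys > 1.2, source, 0.0)
south = np.where(ys <= 1.2, source, 0.0)

# Image each part separately; by linearity the two rings add up
# to the ring formed from the whole source.
ring_north = lens_matrix @ north.ravel()
ring_south = lens_matrix @ south.ravel()
ring_total = lens_matrix @ source.ravel()

print(np.allclose(ring_north + ring_south, ring_total))
```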



5 SUMMARY 

This paper has provided the first models of the Einstein 
ring in the newly discovered Cosmic Horseshoe gravitational 
lens. The semi-linear method of Warren & Dye (2003), in 
which the source distribution is pixelised, remains the 
technique of choice. For a given parametric model of the lens, 
the inversion of the source is linear. Here, we have 
exploited the refinement of adaptive gridding introduced by 
Dye & Warren (2005) and used the Bayesian evidence 
formulation of Suyu et al. (2006) to discriminate between 
different parametric models on an equal basis. 
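
The linear step can be sketched as a regularised least-squares solve. The example below is a minimal illustration under stated assumptions, not the implementation used here: the lensing operator is a random stand-in matrix, the regularisation matrix H defaults to the identity, and the function and variable names are hypothetical. The regularisation weight is scaled so that a value of 1 balances the traces of F and H.

```python
import numpy as np

def semilinear_invert(lens_matrix, image, sigma, lam, reg_matrix=None):
    """Solve (F + lam_eff * H) s = D for the source pixel values s.

    lens_matrix maps source pixels to image pixels (lensing plus PSF),
    image holds the observed ring pixels and sigma their noise.
    """
    n_src = lens_matrix.shape[1]
    if reg_matrix is None:
        reg_matrix = np.eye(n_src)  # assumption: zeroth-order regularisation
    weights = 1.0 / sigma**2
    # F = L^T W L and D = L^T W d, with W the inverse noise variance.
    f_matrix = lens_matrix.T @ (weights[:, None] * lens_matrix)
    d_vector = lens_matrix.T @ (weights * image)
    # Scale the weight so that lam = 1 balances trace(F) and trace(H).
    lam_eff = lam * np.trace(f_matrix) / np.trace(reg_matrix)
    return np.linalg.solve(f_matrix + lam_eff * reg_matrix, d_vector)

rng = np.random.default_rng(1)
lens_matrix = rng.random((40, 10))   # hypothetical lensing operator
true_source = rng.random(10)
image = lens_matrix @ true_source    # noise-free mock ring
sigma = np.full(40, 0.1)

recovered = semilinear_invert(lens_matrix, image, sigma, lam=1e-10)
print(np.max(np.abs(recovered - true_source)))
```

Only the lens parameters require a non-linear search; for each trial lens model the source follows from a single linear solve of this kind.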

The lens in the Cosmic Horseshoe is a luminous red 
galaxy (LRG) lying in a group or loose cluster. Three 
different mass distributions were used to model the total luminous 
and dark matter in the lens - namely, an isothermal 
ellipsoid, a Navarro-Frenk-White profile and a power-law 
ellipsoid. The effects of the cluster were represented by external 
shear. At least as judged by Bayesian evidence, a power-law 
ellipsoid without shear provides the best fit. Specifically, the 
mass density falls off like $\rho \propto r^{-1.96 \pm 0.02}$, where $r$ defines 
similar concentric ellipses with axis ratio q ~ 0.8. 

Remarkably, the contribution of the group to the 
lensing deflection is minimal, despite the huge image separation 
(10") in the Cosmic Horseshoe. This result is consistent with 
the almost circular nature of the Einstein ring. However, it 
means that almost all the lensing effect is produced by an 
enormous LRG - the velocity dispersion estimated from the 
modelling is ~ 500 km s$^{-1}$. This mildly exceeds the velocity 
dispersion of 430 ± 50 km s$^{-1}$ already estimated from a low 
signal-to-noise spectrum by Belokurov et al. (2007). The 
lens galaxy appears to be the most massive LRG ever 
detected. The source reconstructions using the g and the i band 
data are double-peaked, although that built from the noisier 
U band data is not. Although the nature of the double peak 
remains unclear, this result is consistent with the pattern of 
flux density maxima seen along the ring. 
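
As a rough consistency check on the quoted velocity dispersion, the projected mass inside the Einstein ring can be converted to a dispersion under the assumption of a singular isothermal sphere, for which M(<R) = pi sigma^2 R / G. The sketch below uses the mass and radius quoted in the abstract (5.02 x 10^12 solar masses within ~ 30 kpc). The isothermal assumption is a simplification of the preferred power-law model, whose slope of 1.96 is close to the isothermal value of 2, and the function name is hypothetical.

```python
import math

# Gravitational constant in kpc (km/s)^2 per solar mass.
G_KPC_KMS2_PER_MSUN = 4.30091e-6

def sis_velocity_dispersion(mass_msun, radius_kpc):
    """Velocity dispersion (km/s) of an SIS whose projected mass within
    a cylinder of radius radius_kpc equals mass_msun."""
    return math.sqrt(G_KPC_KMS2_PER_MSUN * mass_msun / (math.pi * radius_kpc))

sigma_sis = sis_velocity_dispersion(5.02e12, 30.0)
print(round(sigma_sis))  # roughly 480 km/s
```

This simple estimate lands close to the ~ 500 km s^-1 obtained from the modelling; the exact value depends on the adopted radius and profile.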

Large separation lenses are now being routinely discov- 
ered by searches through data from the Sloan Digital Sky 
Survey. These probe a very different regime to the smaller 
separation lenses. Tools such as the ring inversion algorithm 
employed here can play a substantial role in understand- 
ing the distribution of matter to large radii in very massive 
galaxies. 
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